26 July 2019

Proudly supported by the
people of Western Australia
through Channel 7's Telethon

A Research Compendium

Definition

We introduce the concept of a compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data, …), and as a means for distributing, managing and updating the collection.

Gentleman, R. and Temple Lang, D. (2004)
  • Convention for how you organise your research artefacts into directories
  • A standard and easily recognisable way for organising a reproducible research project
  • Simplifies file management and streamlines analytical workflows
  • Ideal for projects that result in the publication of a paper
  • Easier to communicate your work with other researchers (and your future self)
  • “Project as a Package”
  • R and beyond

 

 

Example Project Directory

The Container

Example Template Project

Project metadata

  • README.md
    • A synopsis of the project
    • Very useful for your future self and colleagues/collaborators
  • NEWS.md
    • Communicates changes to the project files/data
  • DESCRIPTION (R specific; the Python counterpart is requirements.txt)
  • LICENSE
  • .gitignore (Git is covered later)

Project Administration

admin *

  • Project metadata
  • Legal documents (e.g. contracts)
  • Communications
  • Ethics documentation
  • Indirect resources (e.g. papers)
  • Project management resources
  • etc.

* Consider adding directories/files marked with an asterisk to .gitignore

archive *

  • Old code that might be redundant, but is precious
  • Helps to keep project folders tidy

Data Directories

data-raw *

  • A “read-only” directory to store raw data (e.g. Excel, Stata, SAS, .csv)
  • We recommend marking the files as read-only at the operating-system level
  • The heart of reproducible research: every action is traceable from raw data to report
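One way to enforce the read-only recommendation is to clear the write permission bits on every file in the raw data directory. The following is a minimal Python sketch under stated assumptions: the function name `make_read_only` is hypothetical, the directory name `data-raw` comes from the slide, and the permission bits behave as described on Unix-like systems (on Windows only the read-only flag is affected).

```python
import stat
from pathlib import Path
from typing import List


def make_read_only(directory: str) -> List[Path]:
    """Remove write permission from every file under `directory`."""
    changed = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            mode = path.stat().st_mode
            # Clear the write bits for owner, group and others.
            path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
            changed.append(path)
    return changed
```

Calling `make_read_only("data-raw")` once after the raw files arrive guards against accidental edits; the files can still be deleted or re-permissioned deliberately.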

data *

  • Data is converted from the raw format into the software’s preferred structure (e.g. .RData, .rds)
  • Minimal operations are performed on the data:
    • correcting variable names
    • data cleaning
    • data harmonisation/standardisation/coding
    • table joining/splitting
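As an illustration of the first operation in the list, correcting variable names, here is a small Python sketch that converts raw column headers to lower snake_case. The helper names `clean_name` and `clean_header` are hypothetical, and snake_case is only one possible convention; what matters is that the same rule is applied to every column.

```python
import re
from typing import List


def clean_name(name: str) -> str:
    """Convert a raw column header to lower snake_case."""
    # Split camelCase boundaries: "FirstName" -> "First_Name".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Replace each run of non-alphanumeric characters with one underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def clean_header(header: List[str]) -> List[str]:
    """Apply the same naming rule to every column in a header row."""
    return [clean_name(column) for column in header]
```

Keeping the renaming in a script, rather than editing the raw file by hand, preserves the trace from raw data to report.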

cache *

  • Store manipulated data here for statistical analysis

Script/Algorithm Directories

ProjectTemplate Package

munge

Verb (used with or without object)

To manipulate (raw data), especially to convert (data) from one format to another.

www.dictionary.com
  • A ProjectTemplate directory
  • Number the scripts in the order they should be run:
    • 01-munge.R
    • 02-Lexis.R
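The numbering convention makes the run order recoverable from the file names alone. The Python sketch below lists scripts by their numeric prefix; it is a generic illustration, not how ProjectTemplate itself discovers munge scripts, and the function name `ordered_scripts` is hypothetical.

```python
import re
from pathlib import Path
from typing import List


def ordered_scripts(directory: str, suffix: str = ".R") -> List[Path]:
    """Return scripts with a numeric prefix, sorted by that number."""
    pattern = re.compile(r"^(\d+)-")
    numbered = []
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if path.is_file() and path.suffix == suffix and match:
            numbered.append((int(match.group(1)), path.name, path))
    # Sort by number first, then by name to break ties deterministically.
    return [path for _, _, path in sorted(numbered)]
```

Zero-padding the prefix (01, 02, ..., 10) has the added benefit that plain alphabetical sorting in a file browser gives the same order.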

R

(src, py, cpp, or whatever)

  • 00-cleaner.R - read in raw data and save in data
  • 01-main.R - main script file
  • 99-helper.R - custom functions to be read in at start of script
  • data.R - How you can create a data dictionary in R!

Other Directories

  • config
    • ProjectTemplate configuration files, e.g.:
      • Auto-munge
      • Auto-load packages
      • Auto-import cache/data
  • inst
    • Auto-generated files used by some package-related functions
    • Vignettes
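The directories and metadata files described so far can be created in one step. This is a minimal Python sketch, assuming the directory names used in these slides; the function name `scaffold` is hypothetical, and the ignore list simply mirrors the directories marked with an asterisk. In R, ProjectTemplate provides its own project-creation tooling, which this sketch does not attempt to reproduce.

```python
from pathlib import Path

# Directories named in the slides.
DIRECTORIES = ["admin", "archive", "data-raw", "data", "cache",
               "munge", "R", "config", "inst"]
# Directories marked with an asterisk: candidates for .gitignore.
IGNORED = ["admin", "archive", "data-raw", "data", "cache"]
META_FILES = ["README.md", "NEWS.md", "DESCRIPTION", "LICENSE"]


def scaffold(root: str) -> Path:
    """Create an empty compendium skeleton under `root`."""
    root_path = Path(root)
    for name in DIRECTORIES:
        (root_path / name).mkdir(parents=True, exist_ok=True)
    for name in META_FILES:
        # touch() leaves the content of an existing file unchanged.
        (root_path / name).touch(exist_ok=True)
    # One ignore rule per asterisked directory; this overwrites .gitignore.
    gitignore = root_path / ".gitignore"
    gitignore.write_text("".join(f"{name}/\n" for name in IGNORED))
    return root_path
```

Running `scaffold("my-project")` produces the same layout every time, which is what makes the structure easy for collaborators to recognise.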

More Directory Ideas

  • shiny
  • docker
  • logs
  • diagnostics
  • man/doc
  • graphs
  • tests
  • reports/vignettes

Reporting

Biometrics R Markdown Templates

Markdown

Markdown is a lightweight mark-up language with plain text formatting syntax that allows it to be converted to many output formats.

R Markdown documents are fully reproducible. Use a productive notebook interface to weave together narrative text and code to produce elegantly formatted output. Use multiple languages including R, Python, and SQL.

Ioslides

What You’re Looking At!

HTML Report

Live demo

Distributing, Managing and Updating the Collection

Version Control

Tips

Coding Tips

  • Be consistent:
    • Values:
      • “day1” vs “day_1” vs “Day 1”
      • “1st May 1970” vs “01-05-1970” vs “05/01/1970” vs “1970/05/01”
    • Variables:
      • FirstName vs first_name
      • sex vs female
  • Use variable names that a human can understand
  • Document/comment your code!
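To make the point about consistent values concrete, the Python sketch below converts dates written in different styles to a single ISO 8601 form. The function name `to_iso_date` is hypothetical; the key assumption is that the format of each source column is known and stated explicitly, because a value such as 05/01/1970 cannot be interpreted safely by guessing.

```python
from datetime import datetime


def to_iso_date(value: str, fmt: str) -> str:
    """Parse `value` using the explicit format `fmt` and return YYYY-MM-DD."""
    return datetime.strptime(value, fmt).date().isoformat()


# The same day written three ways, each with its format declared.
print(to_iso_date("01-05-1970", "%d-%m-%Y"))  # 1970-05-01
print(to_iso_date("05/01/1970", "%m/%d/%Y"))  # 1970-05-01
print(to_iso_date("1970/05/01", "%Y/%m/%d"))  # 1970-05-01
```

Storing the ISO form in the cleaned data means that text sorting and date sorting agree.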

Useful Tools/Packages

  • RStudio IDE
  • Tidyverse - data wrangling and visualisation
  • repmis: Miscellaneous Tools for Reproducible Research
  • captioner: Store figure and table captions and print them later
  • devtools
  • Telethon Kids Biometrics package https://github.com/TelethonKids/biometrics